Overview
Dataset statistics
| Number of variables | 25 |
|---|---|
| Number of observations | 4190 |
| Missing cells | 3278 |
| Missing cells (%) | 3.1% |
| Duplicate rows | 4 |
| Duplicate rows (%) | 0.1% |
| Total size in memory | 851.1 KiB |
| Average record size in memory | 208.0 B |
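The derived figures in this table can be checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over observations × variables and that the average record size is total memory divided by the number of observations; these definitions match the reported values, but the profiler's internals are not shown in this report.

```python
# Figures copied from the Dataset statistics table above.
n_variables = 25
n_observations = 4190
missing_cells = 3278
duplicate_rows = 4
total_memory_kib = 851.1

# Assumed definitions (not taken from the profiler's source code).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```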
Variable types
| Numeric | 25 |
|---|---|
Alerts
| Alert | Type |
|---|---|
| Dataset has 4 (0.1%) duplicate rows | Duplicates |
| GOODS_DESCRIPTION_len_chars_max is highly overall correlated with GOODS_DESCRIPTION_len_chars_mean and 13 other fields | High correlation |
| GOODS_DESCRIPTION_len_chars_mean is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 7 other fields | High correlation |
| GOODS_DESCRIPTION_len_chars_median is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 4 other fields | High correlation |
| GOODS_DESCRIPTION_len_chars_min is highly overall correlated with GOODS_DESCRIPTION_len_chars_sum and 6 other fields | High correlation |
| GOODS_DESCRIPTION_len_chars_std is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 5 other fields | High correlation |
| GOODS_DESCRIPTION_len_chars_sum is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 14 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_max is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 12 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_mean is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 8 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_median is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 4 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_min is highly overall correlated with GOODS_DESCRIPTION_len_chars_min and 6 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_std is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 6 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_sum is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 12 other fields | High correlation |
| HS06_count is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 10 other fields | High correlation |
| cosine_sim_gd_vs_hs_text_max is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 4 other fields | High correlation |
| cosine_sim_gd_vs_hs_text_mean is highly overall correlated with cosine_sim_gd_vs_hs_text_max and 2 other fields | High correlation |
| cosine_sim_gd_vs_hs_text_median is highly overall correlated with cosine_sim_gd_vs_hs_text_max and 2 other fields | High correlation |
| cosine_sim_gd_vs_hs_text_min is highly overall correlated with GOODS_DESCRIPTION_len_chars_min and 7 other fields | High correlation |
| cosine_sim_gd_vs_hs_text_sum is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 10 other fields | High correlation |
| subtokenization_indicator_max is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 9 other fields | High correlation |
| subtokenization_indicator_mean is highly overall correlated with subtokenization_indicator_max and 2 other fields | High correlation |
| subtokenization_indicator_median is highly overall correlated with subtokenization_indicator_mean | High correlation |
| subtokenization_indicator_min is highly overall correlated with GOODS_DESCRIPTION_len_words_sum and 1 other field | High correlation |
| subtokenization_indicator_std is highly overall correlated with subtokenization_indicator_max and 1 other field | High correlation |
| subtokenization_indicator_sum is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 9 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_std has 540 (12.9%) missing values | Missing |
| GOODS_DESCRIPTION_len_chars_std has 540 (12.9%) missing values | Missing |
| subtokenization_indicator_std has 540 (12.9%) missing values | Missing |
| cosine_sim_gd_vs_hs_text_min has 230 (5.5%) missing values | Missing |
| cosine_sim_gd_vs_hs_text_mean has 230 (5.5%) missing values | Missing |
| cosine_sim_gd_vs_hs_text_median has 230 (5.5%) missing values | Missing |
| cosine_sim_gd_vs_hs_text_max has 230 (5.5%) missing values | Missing |
| cosine_sim_gd_vs_hs_text_std has 738 (17.6%) missing values | Missing |
| GOODS_DESCRIPTION_len_words_std has 100 (2.4%) zeros | Zeros |
| subtokenization_indicator_std has 106 (2.5%) zeros | Zeros |
| cosine_sim_gd_vs_hs_text_sum has 230 (5.5%) zeros | Zeros |
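The 540 missing values in the three `_std` features for GOODS_DESCRIPTION and subtokenization_indicator equal the number of rows where HS06_count is 1 (see the HS06_count frequency table under Variables). One plausible explanation, which the report itself does not confirm, is that each row aggregates one group of records and the standard deviation is the sample form (ddof = 1, the pandas default), which is undefined for a single observation. A minimal sketch with a hypothetical helper `sample_std`:

```python
import math

def sample_std(values):
    """Sample standard deviation (ddof=1); NaN when fewer than two values.

    Hypothetical helper that mirrors the pandas default behaviour;
    it is not code taken from the report's pipeline.
    """
    n = len(values)
    if n < 2:
        return math.nan  # undefined: the divisor n - 1 would be 0
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

single = sample_std([4])        # a group with one record
pair = sample_std([3, 4])       # two word counts that differ by 1
triple = sample_std([1, 1, 2])  # three word counts, one differs by 1

print(single, pair, triple)
```

Under the same assumption, small groups of integer counts would produce values such as √0.5 and √(1/3), which is consistent with 0.7071067812 and 0.5773502692 appearing among the most common values of GOODS_DESCRIPTION_len_words_std.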
Reproduction
| Analysis started | 2025-05-20 10:56:38.654908 |
|---|---|
| Analysis finished | 2025-05-20 11:00:25.357604 |
| Duration | 3 minutes and 46.7 seconds |
| Software version | ydata-profiling v4.12.1 |
| Download configuration | config.json |
Variables
HS06_count
Real number (ℝ)
High correlation 
| Distinct | 430 |
|---|---|
| Distinct (%) | 10.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 63.909308 |
| Minimum | 1 |
|---|---|
| Maximum | 3869 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 13 |
| Q3 | 49 |
| 95-th percentile | 274 |
| Maximum | 3869 |
| Range | 3868 |
| Interquartile range (IQR) | 46 |
Descriptive statistics
| Standard deviation | 185.02558 |
|---|---|
| Coefficient of variation (CV) | 2.8951273 |
| Kurtosis | 105.46571 |
| Mean | 63.909308 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 8.4442706 |
| Sum | 267780 |
| Variance | 34234.467 |
| Monotonicity | Not monotonic |
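Several of the HS06_count figures above are functions of one another. The check below assumes the usual definitions (mean = sum / n, variance = std², CV = std / mean, IQR = Q3 − Q1, range = max − min); the exact estimator settings used by the profiler are not stated in the report, so small differences from rounding are expected.

```python
# Figures copied from the HS06_count tables above.
n = 4190
total = 267780
std = 185.02558
variance = 34234.467
q1, q3 = 3, 49
minimum, maximum = 1, 3869

# Assumed textbook definitions, recomputed from the reported inputs.
mean = total / n
cv = std / mean
iqr = q3 - q1
value_range = maximum - minimum

print(f"mean  = {mean:.6f}")
print(f"cv    = {cv:.7f}")
print(f"std^2 = {std ** 2:.3f} (reported variance: {variance})")
print(f"IQR   = {iqr}, range = {value_range}")
```

The mean (about 63.9) sits far above the median (13), which fits the large positive skewness and kurtosis reported for this variable.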
| Value | Count | Frequency (%) |
| 1 | 540 | 12.9% |
| 2 | 319 | 7.6% |
| 3 | 233 | 5.6% |
| 4 | 176 | 4.2% |
| 5 | 158 | 3.8% |
| 7 | 137 | 3.3% |
| 6 | 124 | 3.0% |
| 8 | 113 | 2.7% |
| 9 | 101 | 2.4% |
| 13 | 83 | 2.0% |
| Other values (420) | 2206 | 52.6% |
| Value | Count | Frequency (%) |
| 1 | 540 | 12.9% |
| 2 | 319 | 7.6% |
| 3 | 233 | 5.6% |
| 4 | 176 | 4.2% |
| 5 | 158 | 3.8% |
| 6 | 124 | 3.0% |
| 7 | 137 | 3.3% |
| 8 | 113 | 2.7% |
| 9 | 101 | 2.4% |
| 10 | 51 | 1.2% |
| Value | Count | Frequency (%) |
| 3869 | 1 | < 0.1% |
| 2924 | 1 | < 0.1% |
| 2779 | 1 | < 0.1% |
| 2632 | 1 | < 0.1% |
| 2247 | 1 | < 0.1% |
| 2074 | 1 | < 0.1% |
| 1988 | 1 | < 0.1% |
| 1874 | 1 | < 0.1% |
| 1857 | 1 | < 0.1% |
| 1836 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_words_sum
Real number (ℝ)
High correlation 
| Distinct | 907 |
|---|---|
| Distinct (%) | 21.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 293.24773 |
| Minimum | 1 |
|---|---|
| Maximum | 22795 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 12 |
| median | 52 |
| Q3 | 206 |
| 95-th percentile | 1246 |
| Maximum | 22795 |
| Range | 22794 |
| Interquartile range (IQR) | 194 |
Descriptive statistics
| Standard deviation | 939.78084 |
|---|---|
| Coefficient of variation (CV) | 3.2047335 |
| Kurtosis | 181.82486 |
| Mean | 293.24773 |
| Median Absolute Deviation (MAD) | 48 |
| Skewness | 10.765924 |
| Sum | 1228708 |
| Variance | 883188.02 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 197 | 4.7% |
| 3 | 139 | 3.3% |
| 4 | 109 | 2.6% |
| 6 | 101 | 2.4% |
| 5 | 98 | 2.3% |
| 1 | 83 | 2.0% |
| 7 | 81 | 1.9% |
| 9 | 65 | 1.6% |
| 8 | 64 | 1.5% |
| 11 | 56 | 1.3% |
| Other values (897) | 3197 | 76.3% |
| Value | Count | Frequency (%) |
| 1 | 83 | 2.0% |
| 2 | 197 | 4.7% |
| 3 | 139 | 3.3% |
| 4 | 109 | 2.6% |
| 5 | 98 | 2.3% |
| 6 | 101 | 2.4% |
| 7 | 81 | 1.9% |
| 8 | 64 | 1.5% |
| 9 | 65 | 1.6% |
| 10 | 48 | 1.1% |
| Value | Count | Frequency (%) |
| 22795 | 1 | < 0.1% |
| 21639 | 1 | < 0.1% |
| 12896 | 1 | < 0.1% |
| 12530 | 1 | < 0.1% |
| 12241 | 1 | < 0.1% |
| 9676 | 1 | < 0.1% |
| 9353 | 1 | < 0.1% |
| 9042 | 1 | < 0.1% |
| 8818 | 1 | < 0.1% |
| 8680 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_words_min
Real number (ℝ)
High correlation 
| Distinct | 17 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.8212411 |
| Minimum | 1 |
|---|---|
| Maximum | 19 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 4 |
| Maximum | 19 |
| Range | 18 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.3782325 |
|---|---|
| Coefficient of variation (CV) | 0.75675458 |
| Kurtosis | 33.363193 |
| Mean | 1.8212411 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.4492171 |
| Sum | 7631 |
| Variance | 1.8995248 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 2121 | 50.6% |
| 2 | 1452 | 34.7% |
| 3 | 333 | 7.9% |
| 4 | 128 | 3.1% |
| 5 | 54 | 1.3% |
| 6 | 40 | 1.0% |
| 7 | 20 | 0.5% |
| 9 | 11 | 0.3% |
| 8 | 10 | 0.2% |
| 10 | 9 | 0.2% |
| Other values (7) | 12 | 0.3% |
| Value | Count | Frequency (%) |
| 1 | 2121 | 50.6% |
| 2 | 1452 | 34.7% |
| 3 | 333 | 7.9% |
| 4 | 128 | 3.1% |
| 5 | 54 | 1.3% |
| 6 | 40 | 1.0% |
| 7 | 20 | 0.5% |
| 8 | 10 | 0.2% |
| 9 | 11 | 0.3% |
| 10 | 9 | 0.2% |
| Value | Count | Frequency (%) |
| 19 | 2 | < 0.1% |
| 18 | 1 | < 0.1% |
| 16 | 2 | < 0.1% |
| 15 | 1 | < 0.1% |
| 14 | 1 | < 0.1% |
| 13 | 2 | < 0.1% |
| 11 | 3 | 0.1% |
| 10 | 9 | 0.2% |
| 9 | 11 | 0.3% |
| 8 | 10 | 0.2% |
GOODS_DESCRIPTION_len_words_mean
Real number (ℝ)
High correlation 
| Distinct | 1692 |
|---|---|
| Distinct (%) | 40.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.0534472 |
| Minimum | 1 |
|---|---|
| Maximum | 19 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 4 |
| Q3 | 4.7544714 |
| 95-th percentile | 6.6 |
| Maximum | 19 |
| Range | 18 |
| Interquartile range (IQR) | 1.7544714 |
Descriptive statistics
| Standard deviation | 1.6385532 |
|---|---|
| Coefficient of variation (CV) | 0.40423698 |
| Kurtosis | 10.007139 |
| Mean | 4.0534472 |
| Median Absolute Deviation (MAD) | 0.84680574 |
| Skewness | 1.8703389 |
| Sum | 16983.944 |
| Variance | 2.6848567 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 268 | 6.4% |
| 3 | 237 | 5.7% |
| 4 | 171 | 4.1% |
| 1 | 95 | 2.3% |
| 5 | 89 | 2.1% |
| 2.5 | 75 | 1.8% |
| 3.5 | 61 | 1.5% |
| 6 | 54 | 1.3% |
| 4.5 | 47 | 1.1% |
| 3.666666667 | 41 | 1.0% |
| Other values (1682) | 3052 | 72.8% |
| Value | Count | Frequency (%) |
| 1 | 95 | 2.3% |
| 1.25 | 1 | < 0.1% |
| 1.333333333 | 8 | 0.2% |
| 1.5 | 38 | 0.9% |
| 1.6 | 2 | < 0.1% |
| 1.666666667 | 14 | 0.3% |
| 1.714285714 | 2 | < 0.1% |
| 1.75 | 5 | 0.1% |
| 1.8 | 5 | 0.1% |
| 1.833333333 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 19 | 2 | < 0.1% |
| 18 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |
| 16 | 2 | < 0.1% |
| 15 | 1 | < 0.1% |
| 14 | 1 | < 0.1% |
| 13.75 | 1 | < 0.1% |
| 13 | 4 | 0.1% |
| 12.98501873 | 1 | < 0.1% |
| 12.25 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_words_median
Real number (ℝ)
High correlation 
| Distinct | 31 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.6504773 |
| Minimum | 1 |
|---|---|
| Maximum | 19 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 3.5 |
| Q3 | 4 |
| 95-th percentile | 6 |
| Maximum | 19 |
| Range | 18 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.6425555 |
|---|---|
| Coefficient of variation (CV) | 0.44995636 |
| Kurtosis | 12.897529 |
| Mean | 3.6504773 |
| Median Absolute Deviation (MAD) | 0.5 |
| Skewness | 2.46108 |
| Sum | 15295.5 |
| Variance | 2.6979886 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 3 | 1187 | 28.3% |
| 4 | 1100 | 26.3% |
| 2 | 589 | 14.1% |
| 5 | 370 | 8.8% |
| 3.5 | 169 | 4.0% |
| 2.5 | 151 | 3.6% |
| 6 | 141 | 3.4% |
| 1 | 119 | 2.8% |
| 4.5 | 90 | 2.1% |
| 7 | 70 | 1.7% |
| Other values (21) | 204 | 4.9% |
| Value | Count | Frequency (%) |
| 1 | 119 | 2.8% |
| 1.5 | 43 | 1.0% |
| 2 | 589 | 14.1% |
| 2.5 | 151 | 3.6% |
| 3 | 1187 | 28.3% |
| 3.5 | 169 | 4.0% |
| 4 | 1100 | 26.3% |
| 4.5 | 90 | 2.1% |
| 5 | 370 | 8.8% |
| 5.5 | 29 | 0.7% |
| Value | Count | Frequency (%) |
| 19 | 2 | < 0.1% |
| 18 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |
| 16 | 3 | 0.1% |
| 14 | 2 | < 0.1% |
| 13.5 | 1 | < 0.1% |
| 13 | 7 | 0.2% |
| 12.5 | 1 | < 0.1% |
| 12 | 2 | < 0.1% |
| 11.5 | 2 | < 0.1% |
GOODS_DESCRIPTION_len_words_max
Real number (ℝ)
High correlation 
| Distinct | 36 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.6 |
| Minimum | 1 |
|---|---|
| Maximum | 41 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 5 |
| median | 8 |
| Q3 | 13 |
| 95-th percentile | 22 |
| Maximum | 41 |
| Range | 40 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 6.1978904 |
|---|---|
| Coefficient of variation (CV) | 0.64561359 |
| Kurtosis | 0.59564961 |
| Mean | 9.6 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 0.92257265 |
| Sum | 40224 |
| Variance | 38.413846 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 4 | 312 | 7.4% |
| 5 | 300 | 7.2% |
| 2 | 299 | 7.1% |
| 3 | 287 | 6.8% |
| 6 | 286 | 6.8% |
| 7 | 274 | 6.5% |
| 10 | 259 | 6.2% |
| 8 | 256 | 6.1% |
| 11 | 237 | 5.7% |
| 9 | 236 | 5.6% |
| Other values (26) | 1444 | 34.5% |
| Value | Count | Frequency (%) |
| 1 | 95 | 2.3% |
| 2 | 299 | 7.1% |
| 3 | 287 | 6.8% |
| 4 | 312 | 7.4% |
| 5 | 300 | 7.2% |
| 6 | 286 | 6.8% |
| 7 | 274 | 6.5% |
| 8 | 256 | 6.1% |
| 9 | 236 | 5.6% |
| 10 | 259 | 6.2% |
| Value | Count | Frequency (%) |
| 41 | 1 | < 0.1% |
| 37 | 2 | < 0.1% |
| 34 | 1 | < 0.1% |
| 33 | 1 | < 0.1% |
| 32 | 3 | 0.1% |
| 31 | 4 | 0.1% |
| 30 | 7 | 0.2% |
| 29 | 7 | 0.2% |
| 28 | 14 | 0.3% |
| 27 | 11 | 0.3% |
GOODS_DESCRIPTION_len_words_std
Real number (ℝ)
High correlation  Missing  Zeros 
| Distinct | 2713 |
|---|---|
| Distinct (%) | 74.3% |
| Missing | 540 |
| Missing (%) | 12.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.2447975 |
| Minimum | 0 |
|---|---|
| Maximum | 10.843585 |
| Zeros | 100 |
| Zeros (%) | 2.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.57735027 |
| Q1 | 1.4142136 |
| median | 2.1247171 |
| Q3 | 2.8739789 |
| 95-th percentile | 4.3664968 |
| Maximum | 10.843585 |
| Range | 10.843585 |
| Interquartile range (IQR) | 1.4597653 |
Descriptive statistics
| Standard deviation | 1.2246282 |
|---|---|
| Coefficient of variation (CV) | 0.54554062 |
| Kurtosis | 3.9982845 |
| Mean | 2.2447975 |
| Median Absolute Deviation (MAD) | 0.71050355 |
| Skewness | 1.2127068 |
| Sum | 8193.5108 |
| Variance | 1.4997142 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.7071067812 | 117 | 2.8% |
| 0 | 100 | 2.4% |
| 1.414213562 | 66 | 1.6% |
| 0.5773502692 | 40 | 1.0% |
| 2.121320344 | 39 | 0.9% |
| 1 | 36 | 0.9% |
| 0.5773502692 | 25 | 0.6% |
| 1.154700538 | 22 | 0.5% |
| 0.9574271078 | 17 | 0.4% |
| 2.828427125 | 16 | 0.4% |
| Other values (2703) | 3172 | 75.7% |
| (Missing) | 540 | 12.9% |
| Value | Count | Frequency (%) |
| 0 | 100 | 2.4% |
| 0.2717464882 | 1 | < 0.1% |
| 0.3755338081 | 1 | < 0.1% |
| 0.377964473 | 1 | < 0.1% |
| 0.377964473 | 1 | < 0.1% |
| 0.4082482905 | 2 | < 0.1% |
| 0.4082482905 | 1 | < 0.1% |
| 0.4409585518 | 1 | < 0.1% |
| 0.4472135955 | 1 | < 0.1% |
| 0.4472135955 | 5 | 0.1% |
| Value | Count | Frequency (%) |
| 10.84358489 | 1 | < 0.1% |
| 10.44030651 | 1 | < 0.1% |
| 9.812528435 | 1 | < 0.1% |
| 9.192388155 | 2 | < 0.1% |
| 8.986100378 | 1 | < 0.1% |
| 8.845903006 | 1 | < 0.1% |
| 8.485281374 | 1 | < 0.1% |
| 8.354615002 | 1 | < 0.1% |
| 8.185352772 | 1 | < 0.1% |
| 8.082903769 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_chars_sum
Real number (ℝ)
High correlation 
| Distinct | 1888 |
|---|---|
| Distinct (%) | 45.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1867.648 |
| Minimum | 3 |
|---|---|
| Maximum | 167738 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 75 |
| median | 334 |
| Q3 | 1324.75 |
| 95-th percentile | 7973.2 |
| Maximum | 167738 |
| Range | 167735 |
| Interquartile range (IQR) | 1249.75 |
Descriptive statistics
| Standard deviation | 6206.2527 |
|---|---|
| Coefficient of variation (CV) | 3.3230313 |
| Kurtosis | 247.35778 |
| Mean | 1867.648 |
| Median Absolute Deviation (MAD) | 306 |
| Skewness | 12.426205 |
| Sum | 7825445 |
| Variance | 38517572 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 23 | 34 | 0.8% |
| 13 | 32 | 0.8% |
| 11 | 31 | 0.7% |
| 14 | 31 | 0.7% |
| 12 | 28 | 0.7% |
| 10 | 26 | 0.6% |
| 19 | 25 | 0.6% |
| 25 | 24 | 0.6% |
| 26 | 23 | 0.5% |
| 16 | 22 | 0.5% |
| Other values (1878) | 3914 | 93.4% |
| Value | Count | Frequency (%) |
| 3 | 1 | < 0.1% |
| 4 | 16 | 0.4% |
| 5 | 12 | 0.3% |
| 6 | 11 | 0.3% |
| 7 | 13 | 0.3% |
| 8 | 13 | 0.3% |
| 9 | 19 | 0.5% |
| 10 | 26 | 0.6% |
| 11 | 31 | 0.7% |
| 12 | 28 | 0.7% |
| Value | Count | Frequency (%) |
| 167738 | 1 | < 0.1% |
| 156074 | 1 | < 0.1% |
| 82898 | 1 | < 0.1% |
| 82038 | 1 | < 0.1% |
| 73835 | 1 | < 0.1% |
| 64804 | 1 | < 0.1% |
| 64331 | 1 | < 0.1% |
| 59853 | 1 | < 0.1% |
| 55286 | 1 | < 0.1% |
| 53641 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_chars_min
Real number (ℝ)
High correlation 
| Distinct | 72 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.419093 |
| Minimum | 2 |
|---|---|
| Maximum | 150 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 6 |
| median | 9 |
| Q3 | 13 |
| 95-th percentile | 26 |
| Maximum | 150 |
| Range | 148 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 9.3133826 |
|---|---|
| Coefficient of variation (CV) | 0.8155974 |
| Kurtosis | 32.335824 |
| Mean | 11.419093 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 4.2796572 |
| Sum | 47846 |
| Variance | 86.739096 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 6 | 417 | 10.0% |
| 7 | 355 | 8.5% |
| 8 | 341 | 8.1% |
| 5 | 334 | 8.0% |
| 9 | 333 | 7.9% |
| 10 | 312 | 7.4% |
| 11 | 274 | 6.5% |
| 4 | 264 | 6.3% |
| 12 | 224 | 5.3% |
| 13 | 178 | 4.2% |
| Other values (62) | 1158 | 27.6% |
| Value | Count | Frequency (%) |
| 2 | 15 | 0.4% |
| 3 | 148 | 3.5% |
| 4 | 264 | 6.3% |
| 5 | 334 | 8.0% |
| 6 | 417 | 10.0% |
| 7 | 355 | 8.5% |
| 8 | 341 | 8.1% |
| 9 | 333 | 7.9% |
| 10 | 312 | 7.4% |
| 11 | 274 | 6.5% |
| Value | Count | Frequency (%) |
| 150 | 1 | < 0.1% |
| 102 | 2 | < 0.1% |
| 100 | 1 | < 0.1% |
| 99 | 2 | < 0.1% |
| 97 | 1 | < 0.1% |
| 88 | 1 | < 0.1% |
| 84 | 1 | < 0.1% |
| 83 | 1 | < 0.1% |
| 80 | 1 | < 0.1% |
| 78 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_chars_mean
Real number (ℝ)
High correlation 
| Distinct | 2436 |
|---|---|
| Distinct (%) | 58.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 26.040292 |
| Minimum | 3 |
|---|---|
| Maximum | 150 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 19.930159 |
| median | 25.10156 |
| Q3 | 30.746164 |
| 95-th percentile | 43.251026 |
| Maximum | 150 |
| Range | 147 |
| Interquartile range (IQR) | 10.816006 |
Descriptive statistics
| Standard deviation | 10.620201 |
|---|---|
| Coefficient of variation (CV) | 0.40783726 |
| Kurtosis | 10.939805 |
| Mean | 26.040292 |
| Median Absolute Deviation (MAD) | 5.3984396 |
| Skewness | 1.9069107 |
| Sum | 109108.82 |
| Variance | 112.78868 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 13 | 47 | 1.1% |
| 14 | 41 | 1.0% |
| 15 | 41 | 1.0% |
| 19 | 40 | 1.0% |
| 24 | 38 | 0.9% |
| 21 | 37 | 0.9% |
| 11 | 37 | 0.9% |
| 10 | 37 | 0.9% |
| 25 | 36 | 0.9% |
| 23 | 36 | 0.9% |
| Other values (2426) | 3800 | 90.7% |
| Value | Count | Frequency (%) |
| 3 | 1 | < 0.1% |
| 4 | 16 | 0.4% |
| 4.5 | 4 | 0.1% |
| 5 | 13 | 0.3% |
| 5.666666667 | 1 | < 0.1% |
| 6 | 12 | 0.3% |
| 6.5 | 2 | < 0.1% |
| 7 | 18 | 0.4% |
| 7.333333333 | 2 | < 0.1% |
| 7.5 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 150 | 1 | < 0.1% |
| 104 | 1 | < 0.1% |
| 103.25 | 1 | < 0.1% |
| 102 | 1 | < 0.1% |
| 100 | 1 | < 0.1% |
| 99 | 2 | < 0.1% |
| 97 | 1 | < 0.1% |
| 88 | 1 | < 0.1% |
| 86.68913858 | 1 | < 0.1% |
| 86.66666667 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_chars_median
Real number (ℝ)
High correlation 
| Distinct | 149 |
|---|---|
| Distinct (%) | 3.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 23.718138 |
| Minimum | 3 |
|---|---|
| Maximum | 150 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 18 |
| median | 22.5 |
| Q3 | 27.5 |
| 95-th percentile | 41 |
| Maximum | 150 |
| Range | 147 |
| Interquartile range (IQR) | 9.5 |
Descriptive statistics
| Standard deviation | 10.587122 |
|---|---|
| Coefficient of variation (CV) | 0.4463724 |
| Kurtosis | 14.599156 |
| Mean | 23.718138 |
| Median Absolute Deviation (MAD) | 4.5 |
| Skewness | 2.569964 |
| Sum | 99379 |
| Variance | 112.08716 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 22 | 198 | 4.7% |
| 24 | 196 | 4.7% |
| 20 | 195 | 4.7% |
| 21 | 191 | 4.6% |
| 23 | 186 | 4.4% |
| 25 | 170 | 4.1% |
| 19 | 169 | 4.0% |
| 26 | 158 | 3.8% |
| 18 | 158 | 3.8% |
| 27 | 152 | 3.6% |
| Other values (139) | 2417 | 57.7% |
| Value | Count | Frequency (%) |
| 3 | 1 | < 0.1% |
| 4 | 17 | 0.4% |
| 4.5 | 4 | 0.1% |
| 5 | 14 | 0.3% |
| 5.5 | 1 | < 0.1% |
| 6 | 16 | 0.4% |
| 6.5 | 2 | < 0.1% |
| 7 | 21 | 0.5% |
| 7.5 | 1 | < 0.1% |
| 8 | 17 | 0.4% |
| Value | Count | Frequency (%) |
| 150 | 1 | < 0.1% |
| 104 | 1 | < 0.1% |
| 102 | 1 | < 0.1% |
| 100 | 1 | < 0.1% |
| 99 | 2 | < 0.1% |
| 97 | 2 | < 0.1% |
| 95.5 | 1 | < 0.1% |
| 93.5 | 1 | < 0.1% |
| 93 | 1 | < 0.1% |
| 90 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_chars_max
Real number (ℝ)
High correlation 
| Distinct | 148 |
|---|---|
| Distinct (%) | 3.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 60.268974 |
| Minimum | 3 |
|---|---|
| Maximum | 150 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 12 |
| Q1 | 30 |
| median | 53 |
| Q3 | 84 |
| 95-th percentile | 143 |
| Maximum | 150 |
| Range | 147 |
| Interquartile range (IQR) | 54 |
Descriptive statistics
| Standard deviation | 37.391868 |
|---|---|
| Coefficient of variation (CV) | 0.62041654 |
| Kurtosis | -0.23455498 |
| Mean | 60.268974 |
| Median Absolute Deviation (MAD) | 27 |
| Skewness | 0.71280785 |
| Sum | 252527 |
| Variance | 1398.1518 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 150 | 134 | 3.2% |
| 100 | 114 | 2.7% |
| 80 | 83 | 2.0% |
| 26 | 63 | 1.5% |
| 32 | 63 | 1.5% |
| 46 | 62 | 1.5% |
| 30 | 60 | 1.4% |
| 23 | 59 | 1.4% |
| 31 | 55 | 1.3% |
| 33 | 53 | 1.3% |
| Other values (138) | 3444 | 82.2% |
| Value | Count | Frequency (%) |
| 3 | 1 | < 0.1% |
| 4 | 16 | 0.4% |
| 5 | 16 | 0.4% |
| 6 | 12 | 0.3% |
| 7 | 16 | 0.4% |
| 8 | 16 | 0.4% |
| 9 | 17 | 0.4% |
| 10 | 38 | 0.9% |
| 11 | 40 | 1.0% |
| 12 | 46 | 1.1% |
| Value | Count | Frequency (%) |
| 150 | 134 | 3.2% |
| 149 | 35 | 0.8% |
| 148 | 9 | 0.2% |
| 147 | 11 | 0.3% |
| 146 | 1 | < 0.1% |
| 145 | 6 | 0.1% |
| 144 | 7 | 0.2% |
| 143 | 11 | 0.3% |
| 142 | 6 | 0.1% |
| 141 | 5 | 0.1% |
GOODS_DESCRIPTION_len_chars_std
Real number (ℝ)
High correlation  Missing 
| Distinct | 3254 |
|---|---|
| Distinct (%) | 89.2% |
| Missing | 540 |
| Missing (%) | 12.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.086485 |
| Minimum | 0 |
|---|---|
| Maximum | 70.741313 |
| Zeros | 18 |
| Zeros (%) | 0.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3.2307911 |
| Q1 | 8.8666728 |
| median | 13.233888 |
| Q3 | 18.211586 |
| 95-th percentile | 27.577164 |
| Maximum | 70.741313 |
| Range | 70.741313 |
| Interquartile range (IQR) | 9.3449129 |
Descriptive statistics
| Standard deviation | 7.7478912 |
|---|---|
| Coefficient of variation (CV) | 0.55002304 |
| Kurtosis | 3.3945739 |
| Mean | 14.086485 |
| Median Absolute Deviation (MAD) | 4.5996349 |
| Skewness | 1.1688069 |
| Sum | 51415.669 |
| Variance | 60.029819 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.7071067812 | 37 | 0.9% |
| 1.414213562 | 22 | 0.5% |
| 4.949747468 | 18 | 0.4% |
| 3.535533906 | 18 | 0.4% |
| 0 | 18 | 0.4% |
| 4.242640687 | 18 | 0.4% |
| 2.828427125 | 17 | 0.4% |
| 2.121320344 | 17 | 0.4% |
| 6.363961031 | 16 | 0.4% |
| 9.192388155 | 13 | 0.3% |
| Other values (3244) | 3456 | 82.5% |
| (Missing) | 540 | 12.9% |
| Value | Count | Frequency (%) |
| 0 | 18 | 0.4% |
| 0.5 | 1 | < 0.1% |
| 0.5773502692 | 1 | < 0.1% |
| 0.5773502692 | 1 | < 0.1% |
| 0.5773502692 | 1 | < 0.1% |
| 0.5773502692 | 3 | 0.1% |
| 0.7071067812 | 37 | 0.9% |
| 0.9831920803 | 1 | < 0.1% |
| 1 | 2 | < 0.1% |
| 1.10194633 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 70.74131278 | 1 | < 0.1% |
| 66.50563886 | 1 | < 0.1% |
| 58.39777393 | 1 | < 0.1% |
| 56.5714887 | 1 | < 0.1% |
| 53.913787 | 1 | < 0.1% |
| 53.03300859 | 1 | < 0.1% |
| 51.83338692 | 1 | < 0.1% |
| 50.52062285 | 1 | < 0.1% |
| 49.17387224 | 1 | < 0.1% |
| 48.49742261 | 1 | < 0.1% |
subtokenization_indicator_sum
Real number (ℝ)
High correlation 
| Distinct | 3045 |
|---|---|
| Distinct (%) | 72.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 124.44661 |
| Minimum | 1 |
|---|---|
| Maximum | 9222.5029 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1.3333333 |
| Q1 | 6 |
| median | 22.655934 |
| Q3 | 88.32619 |
| 95-th percentile | 522.78653 |
| Maximum | 9222.5029 |
| Range | 9221.5029 |
| Interquartile range (IQR) | 82.32619 |
Descriptive statistics
| Standard deviation | 399.72429 |
|---|---|
| Coefficient of variation (CV) | 3.2120142 |
| Kurtosis | 176.86215 |
| Mean | 124.44661 |
| Median Absolute Deviation (MAD) | 20.155934 |
| Skewness | 10.723324 |
| Sum | 521431.31 |
| Variance | 159779.51 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 182 | 4.3% |
| 2 | 124 | 3.0% |
| 3 | 87 | 2.1% |
| 1.5 | 51 | 1.2% |
| 4 | 44 | 1.1% |
| 2.5 | 41 | 1.0% |
| 3.5 | 36 | 0.9% |
| 5 | 23 | 0.5% |
| 6 | 20 | 0.5% |
| 1.333333333 | 20 | 0.5% |
| Other values (3035) | 3562 | 85.0% |
| Value | Count | Frequency (%) |
| 1 | 182 | 4.3% |
| 1.125 | 2 | < 0.1% |
| 1.142857143 | 2 | < 0.1% |
| 1.157894737 | 2 | < 0.1% |
| 1.166666667 | 3 | 0.1% |
| 1.2 | 3 | 0.1% |
| 1.25 | 8 | 0.2% |
| 1.272727273 | 1 | < 0.1% |
| 1.285714286 | 1 | < 0.1% |
| 1.333333333 | 20 | 0.5% |
| Value | Count | Frequency (%) |
| 9222.502852 | 1 | < 0.1% |
| 9110.311291 | 1 | < 0.1% |
| 6992.023549 | 1 | < 0.1% |
| 5652.101169 | 1 | < 0.1% |
| 5143.737089 | 1 | < 0.1% |
| 3882.088258 | 1 | < 0.1% |
| 3719.079594 | 1 | < 0.1% |
| 3710.170501 | 1 | < 0.1% |
| 3643.383152 | 1 | < 0.1% |
| 3587.347869 | 1 | < 0.1% |
subtokenization_indicator_min
Real number (ℝ)
High correlation 
| Distinct | 100 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.2090898 |
| Minimum | 1 |
|---|---|
| Maximum | 10 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1.1666667 |
| 95-th percentile | 2 |
| Maximum | 10 |
| Range | 9 |
| Interquartile range (IQR) | 0.16666667 |
Descriptive statistics
| Standard deviation | 0.53483173 |
|---|---|
| Coefficient of variation (CV) | 0.44234245 |
| Kurtosis | 49.469769 |
| Mean | 1.2090898 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.4195829 |
| Sum | 5066.0861 |
| Variance | 0.28604498 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 3099 | 74.0% |
| 1.5 | 176 | 4.2% |
| 2 | 154 | 3.7% |
| 1.333333333 | 128 | 3.1% |
| 1.25 | 80 | 1.9% |
| 1.666666667 | 56 | 1.3% |
| 3 | 51 | 1.2% |
| 1.75 | 38 | 0.9% |
| 1.2 | 34 | 0.8% |
| 2.5 | 28 | 0.7% |
| Other values (90) | 346 | 8.3% |
| Value | Count | Frequency (%) |
| 1 | 3099 | 74.0% |
| 1.058823529 | 1 | < 0.1% |
| 1.071428571 | 1 | < 0.1% |
| 1.076923077 | 1 | < 0.1% |
| 1.083333333 | 1 | < 0.1% |
| 1.090909091 | 2 | < 0.1% |
| 1.1 | 1 | < 0.1% |
| 1.111111111 | 2 | < 0.1% |
| 1.125 | 10 | 0.2% |
| 1.142857143 | 13 | 0.3% |
| Value | Count | Frequency (%) |
| 10 | 1 | < 0.1% |
| 8.5 | 1 | < 0.1% |
| 7 | 3 | 0.1% |
| 6.5 | 1 | < 0.1% |
| 6.307692308 | 1 | < 0.1% |
| 6 | 3 | 0.1% |
| 5.4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 4.666666667 | 1 | < 0.1% |
| 4.333333333 | 1 | < 0.1% |
subtokenization_indicator_mean
Real number (ℝ)
High correlation 
| Distinct | 3058 |
|---|---|
| Distinct (%) | 73.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.8574769 |
| Minimum | 1 |
|---|---|
| Maximum | 10.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1.5 |
| median | 1.7524351 |
| Q3 | 2.0586465 |
| 95-th percentile | 2.9238435 |
| Maximum | 10.5 |
| Range | 9.5 |
| Interquartile range (IQR) | 0.55864652 |
Descriptive statistics
| Standard deviation | 0.65080889 |
|---|---|
| Coefficient of variation (CV) | 0.35037254 |
| Kurtosis | 27.167124 |
| Mean | 1.8574769 |
| Median Absolute Deviation (MAD) | 0.26732972 |
| Skewness | 3.5414724 |
| Sum | 7782.8281 |
| Variance | 0.42355222 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 257 | 6.1% |
| 1.5 | 104 | 2.5% |
| 2 | 103 | 2.5% |
| 3 | 46 | 1.1% |
| 1.25 | 39 | 0.9% |
| 1.75 | 38 | 0.9% |
| 1.333333333 | 38 | 0.9% |
| 2.5 | 32 | 0.8% |
| 1.666666667 | 25 | 0.6% |
| 2.25 | 15 | 0.4% |
| Other values (3048) | 3493 | 83.4% |
| Value | Count | Frequency (%) |
| 1 | 257 | 6.1% |
| 1.0125 | 1 | < 0.1% |
| 1.047619048 | 1 | < 0.1% |
| 1.055555556 | 1 | < 0.1% |
| 1.058080808 | 1 | < 0.1% |
| 1.060606061 | 1 | < 0.1% |
| 1.066666667 | 1 | < 0.1% |
| 1.071428571 | 1 | < 0.1% |
| 1.075 | 1 | < 0.1% |
| 1.083333333 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 10.5 | 1 | < 0.1% |
| 9.2 | 1 | < 0.1% |
| 8.5 | 1 | < 0.1% |
| 8.166666667 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 7.944444444 | 1 | < 0.1% |
| 7 | 3 | 0.1% |
| 6.5 | 2 | < 0.1% |
| 6.307692308 | 1 | < 0.1% |
| 6 | 3 | 0.1% |
subtokenization_indicator_median
Real number (ℝ)
High correlation 
| Distinct | 408 |
|---|---|
| Distinct (%) | 9.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.706521 |
| Minimum | 1 |
|---|---|
| Maximum | 10.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1.3333333 |
| median | 1.5833333 |
| Q3 | 2 |
| 95-th percentile | 2.7652778 |
| Maximum | 10.5 |
| Range | 9.5 |
| Interquartile range (IQR) | 0.66666667 |
Descriptive statistics
| Standard deviation | 0.63635088 |
|---|---|
| Coefficient of variation (CV) | 0.37289367 |
| Kurtosis | 29.009321 |
| Mean | 1.706521 |
| Median Absolute Deviation (MAD) | 0.25 |
| Skewness | 3.7469941 |
| Sum | 7150.3229 |
| Variance | 0.40494244 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1.5 | 649 | 15.5% |
| 1 | 455 | 10.9% |
| 2 | 439 | 10.5% |
| 1.666666667 | 253 | 6.0% |
| 1.333333333 | 223 | 5.3% |
| 1.75 | 177 | 4.2% |
| 1.25 | 124 | 3.0% |
| 2.5 | 83 | 2.0% |
| 1.6 | 81 | 1.9% |
| 3 | 79 | 1.9% |
| Other values (398) | 1627 | 38.8% |
| Value | Count | Frequency (%) |
| 1 | 455 | 10.9% |
| 1.038461538 | 1 | < 0.1% |
| 1.045454545 | 1 | < 0.1% |
| 1.05 | 1 | < 0.1% |
| 1.055555556 | 3 | 0.1% |
| 1.0625 | 1 | < 0.1% |
| 1.071428571 | 3 | 0.1% |
| 1.083333333 | 6 | 0.1% |
| 1.090909091 | 1 | < 0.1% |
| 1.1 | 10 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.5 | 1 | < 0.1% |
| 8.5 | 2 | < 0.1% |
| 8.166666667 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 7 | 3 | 0.1% |
| 6.5 | 2 | < 0.1% |
| 6.307692308 | 1 | < 0.1% |
| 6 | 5 | 0.1% |
| 5.5 | 1 | < 0.1% |
| 5.4 | 1 | < 0.1% |
subtokenization_indicator_max
Real number (ℝ)
High correlation 
| Distinct | 239 |
|---|---|
| Distinct (%) | 5.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.2938163 |
| Minimum | 1 |
| Maximum | 59 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2.1428571 |
| median | 3.25 |
| Q3 | 5 |
| 95-th percentile | 11 |
| Maximum | 59 |
| Range | 58 |
| Interquartile range (IQR) | 2.8571429 |
Descriptive statistics
| Standard deviation | 3.711385 |
|---|---|
| Coefficient of variation (CV) | 0.86435579 |
| Kurtosis | 22.677383 |
| Mean | 4.2938163 |
| Median Absolute Deviation (MAD) | 1.25 |
| Skewness | 3.5303829 |
| Sum | 17991.09 |
| Variance | 13.774379 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 420 | 10.0% |
| 2 | 337 | 8.0% |
| 4 | 334 | 8.0% |
| 1 | 257 | 6.1% |
| 5 | 198 | 4.7% |
| 2.5 | 193 | 4.6% |
| 3.5 | 151 | 3.6% |
| 6 | 147 | 3.5% |
| 1.5 | 132 | 3.2% |
| 7 | 102 | 2.4% |
| Other values (229) | 1919 | 45.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 257 | 6.1% |
| 1.1 | 1 | < 0.1% |
| 1.125 | 2 | < 0.1% |
| 1.142857143 | 3 | 0.1% |
| 1.157894737 | 2 | < 0.1% |
| 1.166666667 | 4 | 0.1% |
| 1.2 | 7 | 0.2% |
| 1.25 | 16 | 0.4% |
| 1.272727273 | 2 | < 0.1% |
| 1.285714286 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 59 | 1 | < 0.1% |
| 38 | 1 | < 0.1% |
| 36 | 1 | < 0.1% |
| 33 | 1 | < 0.1% |
| 32 | 1 | < 0.1% |
| 30 | 1 | < 0.1% |
| 27 | 3 | 0.1% |
| 26 | 3 | 0.1% |
| 25 | 4 | 0.1% |
| 24 | 3 | 0.1% |
subtokenization_indicator_std
Real number (ℝ)
High correlation  Missing  Zeros 
| Distinct | 3290 |
|---|---|
| Distinct (%) | 90.1% |
| Missing | 540 |
| Missing (%) | 12.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.80102323 |
| Minimum | 0 |
| Maximum | 9.8994949 |
| Zeros | 106 |
| Zeros (%) | 2.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.1767767 |
| Q1 | 0.47434693 |
| median | 0.70001689 |
| Q3 | 0.95482551 |
| 95-th percentile | 1.714059 |
| Maximum | 9.8994949 |
| Range | 9.8994949 |
| Interquartile range (IQR) | 0.48047858 |
Descriptive statistics
| Standard deviation | 0.63966077 |
|---|---|
| Coefficient of variation (CV) | 0.79855458 |
| Kurtosis | 43.937135 |
| Mean | 0.80102323 |
| Median Absolute Deviation (MAD) | 0.23795869 |
| Skewness | 4.8950393 |
| Sum | 2923.7348 |
| Variance | 0.40916591 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 106 | 2.5% |
| 0.3535533906 | 40 | 1.0% |
| 0.7071067812 | 34 | 0.8% |
| 0.1767766953 | 15 | 0.4% |
| 1.060660172 | 10 | 0.2% |
| 0.5773502692 | 10 | 0.2% |
| 1.414213562 | 9 | 0.2% |
| 0.2886751346 | 9 | 0.2% |
| 0.5 | 9 | 0.2% |
| 0.2886751346 | 7 | 0.2% |
| Other values (3280) | 3401 | 81.2% |
| (Missing) | 540 | 12.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 106 | 2.5% |
| 0.03535533906 | 1 | < 0.1% |
| 0.03936479108 | 1 | < 0.1% |
| 0.04040610178 | 1 | < 0.1% |
| 0.04123930494 | 1 | < 0.1% |
| 0.04614625211 | 1 | < 0.1% |
| 0.04714045208 | 1 | < 0.1% |
| 0.04714045208 | 1 | < 0.1% |
| 0.05773502692 | 2 | < 0.1% |
| 0.0589255651 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.899494937 | 1 | < 0.1% |
| 8.718328926 | 1 | < 0.1% |
| 8.249579114 | 1 | < 0.1% |
| 8.030236228 | 1 | < 0.1% |
| 7.653430603 | 1 | < 0.1% |
| 7.334150943 | 1 | < 0.1% |
| 6.363961031 | 1 | < 0.1% |
| 6.128506881 | 1 | < 0.1% |
| 5.880944501 | 1 | < 0.1% |
| 5.876741066 | 1 | < 0.1% |
cosine_sim_gd_vs_hs_text_sum
Real number (ℝ)
High correlation  Zeros 
| Distinct | 3961 |
|---|---|
| Distinct (%) | 94.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50.302241 |
| Minimum | 0 |
| Maximum | 3292.9604 |
| Zeros | 230 |
| Zeros (%) | 5.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2.2520665 |
| median | 9.0587955 |
| Q3 | 36.718788 |
| 95-th percentile | 220.3496 |
| Maximum | 3292.9604 |
| Range | 3292.9604 |
| Interquartile range (IQR) | 34.466722 |
Descriptive statistics
| Standard deviation | 150.5106 |
|---|---|
| Coefficient of variation (CV) | 2.9921252 |
| Kurtosis | 121.04629 |
| Mean | 50.302241 |
| Median Absolute Deviation (MAD) | 8.2062452 |
| Skewness | 8.959491 |
| Sum | 210766.39 |
| Variance | 22653.441 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 230 | 5.5% |
| 0.8601569533 | 1 | < 0.1% |
| 9.849659145 | 1 | < 0.1% |
| 1.614173234 | 1 | < 0.1% |
| 121.2259753 | 1 | < 0.1% |
| 62.14456546 | 1 | < 0.1% |
| 2.743227422 | 1 | < 0.1% |
| 21.21379292 | 1 | < 0.1% |
| 4.493902206 | 1 | < 0.1% |
| 0.8223311305 | 1 | < 0.1% |
| Other values (3951) | 3951 | 94.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 230 | 5.5% |
| 0.6787118316 | 1 | < 0.1% |
| 0.6878749132 | 1 | < 0.1% |
| 0.6927850246 | 1 | < 0.1% |
| 0.7023397684 | 1 | < 0.1% |
| 0.7085582018 | 1 | < 0.1% |
| 0.7114555836 | 1 | < 0.1% |
| 0.7133231163 | 1 | < 0.1% |
| 0.7178073525 | 1 | < 0.1% |
| 0.7198221684 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3292.960431 | 1 | < 0.1% |
| 2610.64089 | 1 | < 0.1% |
| 2369.143401 | 1 | < 0.1% |
| 2105.748377 | 1 | < 0.1% |
| 1905.001764 | 1 | < 0.1% |
| 1681.023988 | 1 | < 0.1% |
| 1564.106728 | 1 | < 0.1% |
| 1549.153605 | 1 | < 0.1% |
| 1504.432459 | 1 | < 0.1% |
| 1464.516026 | 1 | < 0.1% |
cosine_sim_gd_vs_hs_text_min
Real number (ℝ)
High correlation  Missing 
| Distinct | 3957 |
|---|---|
| Distinct (%) | 99.9% |
| Missing | 230 |
| Missing (%) | 5.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.78660514 |
| Minimum | 0.50259566 |
| Maximum | 0.96980786 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0.50259566 |
|---|---|
| 5-th percentile | 0.69116185 |
| Q1 | 0.74834664 |
| median | 0.78297687 |
| Q3 | 0.82530168 |
| 95-th percentile | 0.89048727 |
| Maximum | 0.96980786 |
| Range | 0.4672122 |
| Interquartile range (IQR) | 0.076955035 |
Descriptive statistics
| Standard deviation | 0.059727209 |
|---|---|
| Coefficient of variation (CV) | 0.075930357 |
| Kurtosis | 0.26811685 |
| Mean | 0.78660514 |
| Median Absolute Deviation (MAD) | 0.037771702 |
| Skewness | -0.0039663253 |
| Sum | 3114.9563 |
| Variance | 0.0035673395 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.8899133205 | 2 | < 0.1% |
| 0.828234911 | 2 | < 0.1% |
| 0.836297214 | 2 | < 0.1% |
| 0.8760373592 | 1 | < 0.1% |
| 0.8885804415 | 1 | < 0.1% |
| 0.8473905921 | 1 | < 0.1% |
| 0.7047611475 | 1 | < 0.1% |
| 0.8611924648 | 1 | < 0.1% |
| 0.7250263691 | 1 | < 0.1% |
| 0.7615195513 | 1 | < 0.1% |
| Other values (3947) | 3947 | 94.2% |
| (Missing) | 230 | 5.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5025956631 | 1 | < 0.1% |
| 0.5658186674 | 1 | < 0.1% |
| 0.5669174194 | 1 | < 0.1% |
| 0.5686885715 | 1 | < 0.1% |
| 0.5748020411 | 1 | < 0.1% |
| 0.5852319002 | 1 | < 0.1% |
| 0.587050736 | 1 | < 0.1% |
| 0.5903577805 | 1 | < 0.1% |
| 0.5908305645 | 1 | < 0.1% |
| 0.5967436433 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.9698078632 | 1 | < 0.1% |
| 0.9648881555 | 1 | < 0.1% |
| 0.9586913586 | 1 | < 0.1% |
| 0.9571498632 | 1 | < 0.1% |
| 0.9557199478 | 1 | < 0.1% |
| 0.9547865391 | 1 | < 0.1% |
| 0.9529821277 | 1 | < 0.1% |
| 0.9505479336 | 1 | < 0.1% |
| 0.9503619671 | 1 | < 0.1% |
| 0.9477579594 | 1 | < 0.1% |
cosine_sim_gd_vs_hs_text_mean
Real number (ℝ)
High correlation  Missing 
| Distinct | 3960 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 230 |
| Missing (%) | 5.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.83501187 |
| Minimum | 0.60158304 |
| Maximum | 0.96980786 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0.60158304 |
|---|---|
| 5-th percentile | 0.75920722 |
| Q1 | 0.80768058 |
| median | 0.83575511 |
| Q3 | 0.86522526 |
| 95-th percentile | 0.9077215 |
| Maximum | 0.96980786 |
| Range | 0.36822482 |
| Interquartile range (IQR) | 0.057544687 |
Descriptive statistics
| Standard deviation | 0.045425905 |
|---|---|
| Coefficient of variation (CV) | 0.054401508 |
| Kurtosis | 0.57477478 |
| Mean | 0.83501187 |
| Median Absolute Deviation (MAD) | 0.028919566 |
| Skewness | -0.31314854 |
| Sum | 3306.647 |
| Variance | 0.0020635128 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.8601569533 | 1 | < 0.1% |
| 0.7576660881 | 1 | < 0.1% |
| 0.8070866168 | 1 | < 0.1% |
| 0.8658998238 | 1 | < 0.1% |
| 0.7967251982 | 1 | < 0.1% |
| 0.9144091407 | 1 | < 0.1% |
| 0.8159151123 | 1 | < 0.1% |
| 0.8987804413 | 1 | < 0.1% |
| 0.8223311305 | 1 | < 0.1% |
| 0.9378454685 | 1 | < 0.1% |
| Other values (3950) | 3950 | 94.3% |
| (Missing) | 230 | 5.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.6015830406 | 1 | < 0.1% |
| 0.6519760191 | 1 | < 0.1% |
| 0.659269015 | 1 | < 0.1% |
| 0.6608423758 | 1 | < 0.1% |
| 0.6636832133 | 1 | < 0.1% |
| 0.6676561182 | 1 | < 0.1% |
| 0.6722241243 | 1 | < 0.1% |
| 0.6769229695 | 1 | < 0.1% |
| 0.6779694803 | 1 | < 0.1% |
| 0.6787118316 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.9698078632 | 1 | < 0.1% |
| 0.9648881555 | 1 | < 0.1% |
| 0.9604424271 | 1 | < 0.1% |
| 0.9586913586 | 1 | < 0.1% |
| 0.9571498632 | 1 | < 0.1% |
| 0.9557199478 | 1 | < 0.1% |
| 0.9547865391 | 1 | < 0.1% |
| 0.9529821277 | 1 | < 0.1% |
| 0.9512586355 | 1 | < 0.1% |
| 0.9505479336 | 1 | < 0.1% |
cosine_sim_gd_vs_hs_text_median
Real number (ℝ)
High correlation  Missing 
| Distinct | 3960 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 230 |
| Missing (%) | 5.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.83544312 |
| Minimum | 0.59610879 |
| Maximum | 0.96980786 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0.59610879 |
|---|---|
| 5-th percentile | 0.75779429 |
| Q1 | 0.80637823 |
| median | 0.83579889 |
| Q3 | 0.86729787 |
| 95-th percentile | 0.91101782 |
| Maximum | 0.96980786 |
| Range | 0.37369907 |
| Interquartile range (IQR) | 0.060919642 |
Descriptive statistics
| Standard deviation | 0.047106137 |
|---|---|
| Coefficient of variation (CV) | 0.056384613 |
| Kurtosis | 0.45439295 |
| Mean | 0.83544312 |
| Median Absolute Deviation (MAD) | 0.030300811 |
| Skewness | -0.28549003 |
| Sum | 3308.3547 |
| Variance | 0.0022189881 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.8601569533 | 1 | < 0.1% |
| 0.7560578585 | 1 | < 0.1% |
| 0.8070866168 | 1 | < 0.1% |
| 0.8649627566 | 1 | < 0.1% |
| 0.7968799174 | 1 | < 0.1% |
| 0.9191901684 | 1 | < 0.1% |
| 0.8134569228 | 1 | < 0.1% |
| 0.8949782848 | 1 | < 0.1% |
| 0.8223311305 | 1 | < 0.1% |
| 0.9378454685 | 1 | < 0.1% |
| Other values (3950) | 3950 | 94.3% |
| (Missing) | 230 | 5.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5961087942 | 1 | < 0.1% |
| 0.6471940875 | 1 | < 0.1% |
| 0.6519760191 | 1 | < 0.1% |
| 0.6616500616 | 1 | < 0.1% |
| 0.6638229191 | 1 | < 0.1% |
| 0.664309144 | 1 | < 0.1% |
| 0.6664934754 | 1 | < 0.1% |
| 0.6709260941 | 1 | < 0.1% |
| 0.6767831147 | 1 | < 0.1% |
| 0.6787118316 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.9698078632 | 1 | < 0.1% |
| 0.9648881555 | 1 | < 0.1% |
| 0.9644202292 | 1 | < 0.1% |
| 0.9586913586 | 1 | < 0.1% |
| 0.9571498632 | 1 | < 0.1% |
| 0.9557199478 | 1 | < 0.1% |
| 0.9547865391 | 1 | < 0.1% |
| 0.9529821277 | 1 | < 0.1% |
| 0.9510700703 | 1 | < 0.1% |
| 0.9505479336 | 1 | < 0.1% |
cosine_sim_gd_vs_hs_text_max
Real number (ℝ)
High correlation  Missing 
| Distinct | 3951 |
|---|---|
| Distinct (%) | 99.8% |
| Missing | 230 |
| Missing (%) | 5.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.88693692 |
| Minimum | 0.66693276 |
| Maximum | 0.98533964 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0.66693276 |
|---|---|
| 5-th percentile | 0.79721308 |
| Q1 | 0.85838361 |
| median | 0.89440507 |
| Q3 | 0.92168801 |
| 95-th percentile | 0.95737314 |
| Maximum | 0.98533964 |
| Range | 0.31840688 |
| Interquartile range (IQR) | 0.063304394 |
Descriptive statistics
| Standard deviation | 0.048990507 |
|---|---|
| Coefficient of variation (CV) | 0.055235616 |
| Kurtosis | 0.57578118 |
| Mean | 0.88693692 |
| Median Absolute Deviation (MAD) | 0.031057239 |
| Skewness | -0.71148707 |
| Sum | 3512.2702 |
| Variance | 0.0024000698 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.9547865391 | 2 | < 0.1% |
| 0.8526964188 | 2 | < 0.1% |
| 0.9135311842 | 2 | < 0.1% |
| 0.9143345356 | 2 | < 0.1% |
| 0.9568815827 | 2 | < 0.1% |
| 0.8908780217 | 2 | < 0.1% |
| 0.9003762007 | 2 | < 0.1% |
| 0.8929286003 | 2 | < 0.1% |
| 0.8993106484 | 2 | < 0.1% |
| 0.8837827444 | 1 | < 0.1% |
| Other values (3941) | 3941 | 94.1% |
| (Missing) | 230 | 5.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.6669327617 | 1 | < 0.1% |
| 0.6787118316 | 1 | < 0.1% |
| 0.6878749132 | 1 | < 0.1% |
| 0.6927850246 | 1 | < 0.1% |
| 0.7023397684 | 1 | < 0.1% |
| 0.7065677047 | 1 | < 0.1% |
| 0.7085582018 | 1 | < 0.1% |
| 0.7088137865 | 1 | < 0.1% |
| 0.7114555836 | 1 | < 0.1% |
| 0.7133231163 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.9853396416 | 1 | < 0.1% |
| 0.9845205545 | 1 | < 0.1% |
| 0.9842960835 | 1 | < 0.1% |
| 0.9842685461 | 1 | < 0.1% |
| 0.9842452407 | 1 | < 0.1% |
| 0.9826465249 | 1 | < 0.1% |
| 0.9825786352 | 1 | < 0.1% |
| 0.982221365 | 1 | < 0.1% |
| 0.9819263816 | 1 | < 0.1% |
| 0.9818922281 | 1 | < 0.1% |
cosine_sim_gd_vs_hs_text_std
Real number (ℝ)
Missing 
| Distinct | 3452 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 738 |
| Missing (%) | 17.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.030801574 |
| Minimum | 0.00047849317 |
| Maximum | 0.12595088 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0.00047849317 |
|---|---|
| 5-th percentile | 0.011317525 |
| Q1 | 0.023276149 |
| median | 0.029994322 |
| Q3 | 0.036959365 |
| 95-th percentile | 0.051746202 |
| Maximum | 0.12595088 |
| Range | 0.12547239 |
| Interquartile range (IQR) | 0.013683216 |
Descriptive statistics
| Standard deviation | 0.01268185 |
|---|---|
| Coefficient of variation (CV) | 0.41172732 |
| Kurtosis | 3.2722628 |
| Mean | 0.030801574 |
| Median Absolute Deviation (MAD) | 0.0068340515 |
| Skewness | 0.87867178 |
| Sum | 106.32703 |
| Variance | 0.00016082931 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.02765643764 | 1 | < 0.1% |
| 0.04118186568 | 1 | < 0.1% |
| 0.04183608434 | 1 | < 0.1% |
| 0.00931358171 | 1 | < 0.1% |
| 0.02441861613 | 1 | < 0.1% |
| 0.02533055098 | 1 | < 0.1% |
| 0.005070434463 | 1 | < 0.1% |
| 0.01084980618 | 1 | < 0.1% |
| 0.02489916157 | 1 | < 0.1% |
| 0.04054674341 | 1 | < 0.1% |
| Other values (3442) | 3442 | 82.1% |
| (Missing) | 738 | 17.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0004784931711 | 1 | < 0.1% |
| 0.0006112978908 | 1 | < 0.1% |
| 0.0006257964067 | 1 | < 0.1% |
| 0.0006765412123 | 1 | < 0.1% |
| 0.000708067055 | 1 | < 0.1% |
| 0.0007940887728 | 1 | < 0.1% |
| 0.0008529679202 | 1 | < 0.1% |
| 0.001015949783 | 1 | < 0.1% |
| 0.001178576934 | 1 | < 0.1% |
| 0.001218296803 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.1259508834 | 1 | < 0.1% |
| 0.1160734264 | 1 | < 0.1% |
| 0.1007662694 | 1 | < 0.1% |
| 0.09964867902 | 1 | < 0.1% |
| 0.08998957681 | 1 | < 0.1% |
| 0.08872843057 | 1 | < 0.1% |
| 0.08651958941 | 1 | < 0.1% |
| 0.08572085898 | 1 | < 0.1% |
| 0.08256047565 | 1 | < 0.1% |
| 0.0806344574 | 1 | < 0.1% |
Interactions
Correlations
| | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_words_sum | HS06_count | cosine_sim_gd_vs_hs_text_max | cosine_sim_gd_vs_hs_text_mean | cosine_sim_gd_vs_hs_text_median | cosine_sim_gd_vs_hs_text_min | cosine_sim_gd_vs_hs_text_std | cosine_sim_gd_vs_hs_text_sum | subtokenization_indicator_max | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_min | subtokenization_indicator_std | subtokenization_indicator_sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GOODS_DESCRIPTION_len_chars_max | 1.000 | 0.716 | 0.520 | -0.361 | 0.842 | 0.862 | 0.963 | 0.706 | 0.504 | -0.335 | 0.804 | 0.854 | 0.789 | 0.510 | -0.019 | -0.020 | -0.379 | 0.134 | 0.718 | 0.630 | 0.202 | 0.102 | -0.357 | 0.331 | 0.788 |
| GOODS_DESCRIPTION_len_chars_mean | 0.716 | 1.000 | 0.909 | 0.165 | 0.728 | 0.501 | 0.685 | 0.929 | 0.819 | 0.147 | 0.714 | 0.486 | 0.351 | 0.372 | 0.231 | 0.236 | -0.010 | -0.004 | 0.340 | 0.323 | 0.255 | 0.231 | -0.025 | 0.169 | 0.371 |
| GOODS_DESCRIPTION_len_chars_median | 0.520 | 0.909 | 1.000 | 0.245 | 0.462 | 0.393 | 0.503 | 0.840 | 0.875 | 0.227 | 0.473 | 0.379 | 0.249 | 0.294 | 0.262 | 0.276 | 0.061 | -0.066 | 0.249 | 0.243 | 0.253 | 0.257 | 0.055 | 0.115 | 0.274 |
| GOODS_DESCRIPTION_len_chars_min | -0.361 | 0.165 | 0.245 | 1.000 | -0.182 | -0.552 | -0.380 | 0.101 | 0.193 | 0.837 | -0.166 | -0.562 | -0.641 | -0.201 | 0.317 | 0.308 | 0.551 | -0.237 | -0.552 | -0.481 | -0.033 | 0.068 | 0.461 | -0.278 | -0.623 |
| GOODS_DESCRIPTION_len_chars_std | 0.842 | 0.728 | 0.462 | -0.182 | 1.000 | 0.506 | 0.787 | 0.684 | 0.414 | -0.169 | 0.913 | 0.495 | 0.402 | 0.342 | 0.058 | 0.053 | -0.157 | 0.198 | 0.368 | 0.339 | 0.158 | 0.106 | -0.156 | 0.244 | 0.407 |
| GOODS_DESCRIPTION_len_chars_sum | 0.862 | 0.501 | 0.393 | -0.552 | 0.506 | 1.000 | 0.864 | 0.526 | 0.402 | -0.518 | 0.516 | 0.998 | 0.984 | 0.501 | -0.118 | -0.109 | -0.521 | 0.106 | 0.887 | 0.753 | 0.195 | 0.074 | -0.490 | 0.394 | 0.977 |
| GOODS_DESCRIPTION_len_words_max | 0.963 | 0.685 | 0.503 | -0.380 | 0.787 | 0.864 | 1.000 | 0.731 | 0.521 | -0.342 | 0.845 | 0.867 | 0.798 | 0.479 | -0.040 | -0.038 | -0.393 | 0.110 | 0.722 | 0.603 | 0.155 | 0.061 | -0.381 | 0.298 | 0.788 |
| GOODS_DESCRIPTION_len_words_mean | 0.706 | 0.929 | 0.840 | 0.101 | 0.684 | 0.526 | 0.731 | 1.000 | 0.886 | 0.147 | 0.750 | 0.533 | 0.390 | 0.326 | 0.160 | 0.163 | -0.064 | -0.032 | 0.367 | 0.261 | 0.121 | 0.114 | -0.100 | 0.067 | 0.386 |
| GOODS_DESCRIPTION_len_words_median | 0.504 | 0.819 | 0.875 | 0.193 | 0.414 | 0.402 | 0.521 | 0.886 | 1.000 | 0.236 | 0.458 | 0.409 | 0.273 | 0.240 | 0.179 | 0.187 | 0.009 | -0.091 | 0.263 | 0.161 | 0.101 | 0.121 | -0.021 | -0.002 | 0.271 |
| GOODS_DESCRIPTION_len_words_min | -0.335 | 0.147 | 0.227 | 0.837 | -0.169 | -0.518 | -0.342 | 0.147 | 0.236 | 1.000 | -0.180 | -0.514 | -0.601 | -0.213 | 0.264 | 0.254 | 0.494 | -0.222 | -0.522 | -0.568 | -0.187 | -0.058 | 0.368 | -0.402 | -0.612 |
| GOODS_DESCRIPTION_len_words_std | 0.804 | 0.714 | 0.473 | -0.166 | 0.913 | 0.516 | 0.845 | 0.750 | 0.458 | -0.180 | 1.000 | 0.519 | 0.414 | 0.305 | 0.048 | 0.045 | -0.153 | 0.140 | 0.377 | 0.324 | 0.134 | 0.087 | -0.149 | 0.216 | 0.413 |
| GOODS_DESCRIPTION_len_words_sum | 0.854 | 0.486 | 0.379 | -0.562 | 0.495 | 0.998 | 0.867 | 0.533 | 0.409 | -0.514 | 0.519 | 1.000 | 0.985 | 0.489 | -0.129 | -0.122 | -0.527 | 0.100 | 0.886 | 0.734 | 0.167 | 0.050 | -0.504 | 0.372 | 0.972 |
| HS06_count | 0.789 | 0.351 | 0.249 | -0.641 | 0.402 | 0.984 | 0.798 | 0.390 | 0.273 | -0.601 | 0.414 | 0.985 | 1.000 | 0.467 | -0.175 | -0.167 | -0.568 | 0.120 | 0.897 | 0.755 | 0.167 | 0.038 | -0.536 | 0.398 | 0.988 |
| cosine_sim_gd_vs_hs_text_max | 0.510 | 0.372 | 0.294 | -0.201 | 0.342 | 0.501 | 0.479 | 0.326 | 0.240 | -0.213 | 0.305 | 0.489 | 0.467 | 1.000 | 0.578 | 0.554 | 0.171 | 0.190 | 0.502 | 0.416 | 0.227 | 0.185 | -0.115 | 0.200 | 0.484 |
| cosine_sim_gd_vs_hs_text_mean | -0.019 | 0.231 | 0.262 | 0.317 | 0.058 | -0.118 | -0.040 | 0.160 | 0.179 | 0.264 | 0.048 | -0.129 | -0.175 | 0.578 | 1.000 | 0.989 | 0.824 | -0.263 | -0.127 | -0.069 | 0.155 | 0.199 | 0.258 | -0.036 | -0.147 |
| cosine_sim_gd_vs_hs_text_median | -0.020 | 0.236 | 0.276 | 0.308 | 0.053 | -0.109 | -0.038 | 0.163 | 0.187 | 0.254 | 0.045 | -0.122 | -0.167 | 0.554 | 0.989 | 1.000 | 0.797 | -0.256 | -0.119 | -0.061 | 0.164 | 0.206 | 0.254 | -0.028 | -0.138 |
| cosine_sim_gd_vs_hs_text_min | -0.379 | -0.010 | 0.061 | 0.551 | -0.157 | -0.521 | -0.393 | -0.064 | 0.009 | 0.494 | -0.153 | -0.527 | -0.568 | 0.171 | 0.824 | 0.797 | 1.000 | -0.494 | -0.532 | -0.395 | 0.004 | 0.093 | 0.426 | -0.192 | -0.545 |
| cosine_sim_gd_vs_hs_text_std | 0.134 | -0.004 | -0.066 | -0.237 | 0.198 | 0.106 | 0.110 | -0.032 | -0.091 | -0.222 | 0.140 | 0.100 | 0.120 | 0.190 | -0.263 | -0.256 | -0.494 | 1.000 | 0.111 | 0.103 | 0.044 | 0.028 | -0.109 | 0.098 | 0.122 |
| cosine_sim_gd_vs_hs_text_sum | 0.718 | 0.340 | 0.249 | -0.552 | 0.368 | 0.887 | 0.722 | 0.367 | 0.263 | -0.522 | 0.377 | 0.886 | 0.897 | 0.502 | -0.127 | -0.119 | -0.532 | 0.111 | 1.000 | 0.685 | 0.171 | 0.057 | -0.471 | 0.366 | 0.889 |
| subtokenization_indicator_max | 0.630 | 0.323 | 0.243 | -0.481 | 0.339 | 0.753 | 0.603 | 0.261 | 0.161 | -0.568 | 0.324 | 0.734 | 0.755 | 0.416 | -0.069 | -0.061 | -0.395 | 0.103 | 0.685 | 1.000 | 0.620 | 0.390 | -0.212 | 0.863 | 0.827 |
| subtokenization_indicator_mean | 0.202 | 0.255 | 0.253 | -0.033 | 0.158 | 0.195 | 0.155 | 0.121 | 0.101 | -0.187 | 0.134 | 0.167 | 0.167 | 0.227 | 0.155 | 0.164 | 0.004 | 0.044 | 0.171 | 0.620 | 1.000 | 0.906 | 0.400 | 0.652 | 0.303 |
| subtokenization_indicator_median | 0.102 | 0.231 | 0.257 | 0.068 | 0.106 | 0.074 | 0.061 | 0.114 | 0.121 | -0.058 | 0.087 | 0.050 | 0.038 | 0.185 | 0.199 | 0.206 | 0.093 | 0.028 | 0.057 | 0.390 | 0.906 | 1.000 | 0.495 | 0.366 | 0.166 |
| subtokenization_indicator_min | -0.357 | -0.025 | 0.055 | 0.461 | -0.156 | -0.490 | -0.381 | -0.100 | -0.021 | 0.368 | -0.149 | -0.504 | -0.536 | -0.115 | 0.258 | 0.254 | 0.426 | -0.109 | -0.471 | -0.212 | 0.400 | 0.495 | 1.000 | -0.150 | -0.453 |
| subtokenization_indicator_std | 0.331 | 0.169 | 0.115 | -0.278 | 0.244 | 0.394 | 0.298 | 0.067 | -0.002 | -0.402 | 0.216 | 0.372 | 0.398 | 0.200 | -0.036 | -0.028 | -0.192 | 0.098 | 0.366 | 0.863 | 0.652 | 0.366 | -0.150 | 1.000 | 0.487 |
| subtokenization_indicator_sum | 0.788 | 0.371 | 0.274 | -0.623 | 0.407 | 0.977 | 0.788 | 0.386 | 0.271 | -0.612 | 0.413 | 0.972 | 0.988 | 0.484 | -0.147 | -0.138 | -0.545 | 0.122 | 0.889 | 0.827 | 0.303 | 0.166 | -0.453 | 0.487 | 1.000 |
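The matrix above is easier to scan once it is reduced to the pairs that exceed a cut-off. The sketch below does that for four of the 25 fields, with coefficients copied from the table. The 0.5 threshold and the `high_pairs` helper are assumptions made for illustration; the report does not state the threshold or the coefficient type behind its "High correlation" alerts.

```python
from itertools import combinations

# Four of the 25 fields, with coefficients copied from the table above.
fields = [
    "HS06_count",
    "GOODS_DESCRIPTION_len_words_sum",
    "subtokenization_indicator_sum",
    "cosine_sim_gd_vs_hs_text_mean",
]
corr = {
    ("HS06_count", "GOODS_DESCRIPTION_len_words_sum"): 0.985,
    ("HS06_count", "subtokenization_indicator_sum"): 0.988,
    ("HS06_count", "cosine_sim_gd_vs_hs_text_mean"): -0.175,
    ("GOODS_DESCRIPTION_len_words_sum", "subtokenization_indicator_sum"): 0.972,
    ("GOODS_DESCRIPTION_len_words_sum", "cosine_sim_gd_vs_hs_text_mean"): -0.129,
    ("subtokenization_indicator_sum", "cosine_sim_gd_vs_hs_text_mean"): -0.147,
}

THRESHOLD = 0.5  # hypothetical cut-off, not taken from the report


def high_pairs(names, coefficients, threshold):
    """Return (field_a, field_b, r) for pairs with |r| >= threshold, strongest first."""
    selected = [
        (a, b, coefficients[(a, b)])
        for a, b in combinations(names, 2)
        if abs(coefficients[(a, b)]) >= threshold
    ]
    return sorted(selected, key=lambda item: abs(item[2]), reverse=True)


pairs = high_pairs(fields, corr, THRESHOLD)
for a, b, r in pairs:
    print(f"{a} ~ {b}: {r:+.3f}")
```

In this subset the three `_sum` and count fields correlate with one another at 0.97 or above, while cosine_sim_gd_vs_hs_text_mean stays below the cut-off against all of them.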
Missing values
Sample
First rows
| HS06 | HS06_count | GOODS_DESCRIPTION_len_words_sum | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_std | subtokenization_indicator_sum | subtokenization_indicator_min | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_max | subtokenization_indicator_std | cosine_sim_gd_vs_hs_text_sum | cosine_sim_gd_vs_hs_text_min | cosine_sim_gd_vs_hs_text_mean | cosine_sim_gd_vs_hs_text_median | cosine_sim_gd_vs_hs_text_max | cosine_sim_gd_vs_hs_text_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10121 | 5 | 30 | 4 | 6.0 | 6.0 | 7 | 1.224745 | 172 | 24 | 34.4 | 33.0 | 46 | 8.203658 | 6.738095 | 1.0 | 1.347619 | 1.285714 | 1.666667 | 0.251751 | 4.493902 | 0.889913 | 0.898780 | 0.894978 | 0.912786 | 0.009314 |
| 10130 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 16 | 16 | 16.0 | 16.0 | 16 | NaN | 1.500000 | 1.5 | 1.500000 | 1.500000 | 1.500000 | NaN | 0.822331 | 0.822331 | 0.822331 | 0.822331 | 0.822331 | NaN |
| 10190 | 1 | 3 | 3 | 3.0 | 3.0 | 3 | NaN | 15 | 15 | 15.0 | 15.0 | 15 | NaN | 1.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | NaN | 0.866746 | 0.866746 | 0.866746 | 0.866746 | 0.866746 | NaN |
| 10221 | 5 | 20 | 2 | 4.0 | 3.0 | 7 | 2.345208 | 103 | 8 | 20.6 | 19.0 | 34 | 11.631853 | 7.000000 | 1.0 | 1.400000 | 1.333333 | 2.000000 | 0.434613 | 4.574999 | 0.876037 | 0.915000 | 0.915777 | 0.937644 | 0.024419 |
| 10229 | 2 | 8 | 3 | 4.0 | 4.0 | 5 | 1.414214 | 47 | 22 | 23.5 | 23.5 | 25 | 2.121320 | 3.333333 | 1.0 | 1.666667 | 1.666667 | 2.333333 | 0.942809 | 1.812984 | 0.888580 | 0.906492 | 0.906492 | 0.924403 | 0.025331 |
| 10231 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 9 | 9 | 9.0 | 9.0 | 9 | NaN | 1.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | NaN | 0.847391 | 0.847391 | 0.847391 | 0.847391 | 0.847391 | NaN |
| 10290 | 2 | 5 | 2 | 2.5 | 2.5 | 3 | 0.707107 | 20 | 8 | 10.0 | 10.0 | 12 | 2.828427 | 2.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 1.875691 | 0.934260 | 0.937845 | 0.937845 | 0.941431 | 0.005070 |
| 10310 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 16 | 16 | 16.0 | 16.0 | 16 | NaN | 2.000000 | 2.0 | 2.000000 | 2.000000 | 2.000000 | NaN | 0.897062 | 0.897062 | 0.897062 | 0.897062 | 0.897062 | NaN |
| 10392 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | NaN | 4 | 4 | 4.0 | 4.0 | 4 | NaN | 1.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | NaN | 0.860157 | 0.860157 | 0.860157 | 0.860157 | 0.860157 | NaN |
| 10420 | 3 | 6 | 1 | 2.0 | 1.0 | 4 | 1.732051 | 33 | 4 | 11.0 | 9.0 | 20 | 8.185353 | 3.500000 | 1.0 | 1.166667 | 1.000000 | 1.500000 | 0.288675 | 2.743227 | 0.901990 | 0.914409 | 0.919190 | 0.922047 | 0.010850 |
Last rows
| HS06 | HS06_count | GOODS_DESCRIPTION_len_words_sum | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_std | subtokenization_indicator_sum | subtokenization_indicator_min | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_max | subtokenization_indicator_std | cosine_sim_gd_vs_hs_text_sum | cosine_sim_gd_vs_hs_text_min | cosine_sim_gd_vs_hs_text_mean | cosine_sim_gd_vs_hs_text_median | cosine_sim_gd_vs_hs_text_max | cosine_sim_gd_vs_hs_text_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 961700 | 140 | 532 | 1 | 3.800000 | 3.0 | 14 | 1.885957 | 3367 | 3 | 24.050000 | 21.0 | 88 | 12.518375 | 271.683333 | 1.0 | 1.940595 | 1.732143 | 6.333333 | 0.805809 | 0.000000 | NaN | NaN | NaN | NaN | NaN |
| 961800 | 55 | 149 | 1 | 2.709091 | 2.0 | 6 | 1.242350 | 993 | 5 | 18.054545 | 15.0 | 46 | 8.754720 | 99.133333 | 1.0 | 1.802424 | 1.666667 | 5.000000 | 0.783828 | 0.000000 | NaN | NaN | NaN | NaN | NaN |
| 961900 | 270 | 1312 | 1 | 4.859259 | 5.0 | 17 | 2.714858 | 7045 | 4 | 26.092593 | 27.5 | 73 | 12.376180 | 485.317482 | 1.0 | 1.797472 | 1.571429 | 5.000000 | 0.687515 | 0.000000 | NaN | NaN | NaN | NaN | NaN |
| 962000 | 44 | 166 | 1 | 3.772727 | 3.0 | 9 | 1.951310 | 1010 | 6 | 22.954545 | 21.0 | 57 | 11.309503 | 77.407937 | 1.0 | 1.759271 | 1.550000 | 6.500000 | 0.867186 | 0.000000 | NaN | NaN | NaN | NaN | NaN |
| 970110 | 26 | 68 | 1 | 2.615385 | 3.0 | 5 | 1.267341 | 490 | 6 | 18.846154 | 16.5 | 47 | 10.414192 | 33.833333 | 1.0 | 1.301282 | 1.000000 | 2.800000 | 0.575402 | 21.335309 | 0.733327 | 0.820589 | 0.820477 | 0.883599 | 0.030272 |
| 970190 | 26 | 98 | 1 | 3.769231 | 2.5 | 13 | 3.037205 | 621 | 6 | 23.884615 | 18.0 | 71 | 17.673318 | 37.526496 | 1.0 | 1.443327 | 1.138889 | 4.000000 | 0.708957 | 21.213793 | 0.742387 | 0.815915 | 0.813457 | 0.902661 | 0.041836 |
| 970200 | 1 | 3 | 3 | 3.000000 | 3.0 | 3 | NaN | 14 | 14 | 14.000000 | 14.0 | 14 | NaN | 1.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | NaN | 0.000000 | NaN | NaN | NaN | NaN | NaN |
| 970300 | 31 | 100 | 1 | 3.225806 | 2.0 | 12 | 2.261411 | 672 | 9 | 21.677419 | 17.0 | 74 | 14.246373 | 53.633333 | 1.0 | 1.730108 | 1.500000 | 4.000000 | 0.844376 | 0.000000 | NaN | NaN | NaN | NaN | NaN |
| 970400 | 6 | 16 | 1 | 2.666667 | 2.5 | 4 | 1.211060 | 100 | 7 | 16.666667 | 14.0 | 30 | 9.025889 | 7.000000 | 1.0 | 1.166667 | 1.000000 | 1.750000 | 0.302765 | 0.000000 | NaN | NaN | NaN | NaN | NaN |
| 970500 | 4 | 11 | 1 | 2.750000 | 2.0 | 6 | 2.217356 | 75 | 8 | 18.750000 | 15.0 | 37 | 12.632630 | 6.666667 | 1.0 | 1.666667 | 1.500000 | 2.666667 | 0.816497 | 0.000000 | NaN | NaN | NaN | NaN | NaN |
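Two patterns in the sample rows can be reproduced with a short sketch: every `_std` column is NaN when HS06_count is 1, and cosine_sim_gd_vs_hs_text_sum is 0 wherever the other cosine columns are NaN. Both are what one would expect from a sample standard deviation (n - 1 denominator) and a sum that skips missing values. How the columns were actually computed is not stated in the report, so the helpers below are hypothetical and only mimic that behaviour.

```python
import math


def nan_skipping_sum(values):
    """Sum that ignores NaN entries; an all-NaN input therefore yields 0.0."""
    return sum((v for v in values if not math.isnan(v)), 0.0)


def nan_skipping_mean(values):
    """Mean over non-NaN entries; NaN when nothing is left."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def sample_std(values):
    """Standard deviation with an n - 1 denominator; NaN for fewer than two values."""
    kept = [v for v in values if not math.isnan(v)]
    if len(kept) < 2:
        return math.nan
    mean = sum(kept) / len(kept)
    return math.sqrt(sum((v - mean) ** 2 for v in kept) / (len(kept) - 1))


# HS06 10229: two descriptions; word counts of 3 and 5 match the reported
# sum 8, min 3, max 5.
two_rows = [3, 5]
# HS06 10130: a single description of 2 words.
one_row = [2]
# HS06 970200: a single description whose cosine similarity is missing.
missing_cosine = [math.nan]

print(sample_std(two_rows))               # compare with len_words_std for 10229
print(sample_std(one_row))                # NaN, as in the single-row groups
print(nan_skipping_sum(missing_cosine))   # 0.0, as in cosine_sim_gd_vs_hs_text_sum
print(nan_skipping_mean(missing_cosine))  # NaN, as in cosine_sim_gd_vs_hs_text_mean
```

Under these assumptions, the 230 zeros reported for cosine_sim_gd_vs_hs_text_sum and the 230 missing values reported for the min, mean, median, and max columns would describe the same rows.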
Duplicate rows
Most frequently occurring
| | HS06_count | GOODS_DESCRIPTION_len_words_sum | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_std | subtokenization_indicator_sum | subtokenization_indicator_min | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_max | subtokenization_indicator_std | cosine_sim_gd_vs_hs_text_sum | cosine_sim_gd_vs_hs_text_min | cosine_sim_gd_vs_hs_text_mean | cosine_sim_gd_vs_hs_text_median | cosine_sim_gd_vs_hs_text_max | cosine_sim_gd_vs_hs_text_std | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | NaN | 4 | 4 | 4.0 | 4.0 | 4 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 2 |
| 1 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | NaN | 9 | 9 | 9.0 | 9.0 | 9 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 2 |
| 2 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 10 | 10 | 10.0 | 10.0 | 10 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 2 |
| 3 | 2 | 5 | 2 | 2.5 | 2.5 | 3 | 0.707107 | 27 | 11 | 13.5 | 13.5 | 16 | 3.535534 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | 2 |